Evolutionarily Conserved Substrate Substructures for Automated Annotation of Enzyme Superfamilies

Author: A Aharoni
AE Todd
AG Murzin
Andrej Sali
AS Mildvan
C Kalyanaraman
C Steinbeck
CM Seibert
CS Riesenfeld
CT Porter
D Weininger
DJ Weininger
DM Schmidt
GL Holliday
HM Holden
I Friedberg
I Nobeli
I Schomburg
I Shah
J Barthelmes
JA Gerlt
JC Hermann
JJ Diaz-Mejia
K Tipton
KA Frazer
KN Allen
L Holm
L Song
M Ashburner
M Bashton
M Kotera
MA Marti-Renom
ME Glasner
MJ Bessman
MJ Keiser
N Nagano
NH Horowitz
NM O'Boyle
Patricia C. Babbitt
PC Babbitt
R Alves
RA Nagatani
Ranyee A. Chiang
Robert B. Russell
S Light
S Schmidt
SC Pegg
SC Rison
SD Copley
TL O'Loughlin
WR Pearson
Publication venue: Public Library of Science
Publication date: 01/08/2008

The evolution of enzymes affects how well a species can adapt to new environmental conditions. During enzyme evolution, certain aspects of molecular function are conserved while other aspects can vary. Aspects of function that are more difficult to change or that need to be reused in multiple contexts are often conserved, while those that vary may indicate functions that are more easily changed or that are no longer required. In analogy to the study of conservation patterns in enzyme sequences and structures, we have examined the patterns of conservation and variation in enzyme function by analyzing graph isomorphisms among enzyme substrates of a large number of enzyme superfamilies. This systematic analysis of substrate substructures establishes the conservation patterns that typify individual superfamilies. Specifically, we determined the chemical substructures that are conserved among all known substrates of a superfamily and the substructures that are reacting in these substrates and then examined the relationship between the two. Across the 42 superfamilies that were analyzed, substantial variation was found in how much of the conserved substructure is reacting, suggesting that superfamilies may not be easily grouped into discrete and separable categories. Instead, our results suggest that many superfamilies may need to be treated individually for analyses of evolution, function prediction, and guiding enzyme engineering strategies. Annotating superfamilies with these conserved and reacting substructure patterns provides information that is orthogonal to information provided by studies of conservation in superfamily sequences and structures, thereby improving the precision with which we can predict the functions of enzymes of unknown function and direct studies in enzyme engineering. 
Because the method is automated, it is suitable for large-scale characterization and comparison of fundamental functional capabilities of both characterized and uncharacterized enzyme superfamilies.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

βα-Hairpin Clamps Brace βαβ Modules and Can Make Substantive Contributions to the Stability of TIM Barrel Proteins

Author: A Horovitz
A Radzicka
A Sali
AG Murzin
Andreas Hofmann
B Ibarra-Molero
B Schneider
C Branden
C Jurgens
C. Robert Matthews
CA Rohl
CC Hyde
CR Matthews
D Rothlisberger
DF Stickle
EN Baker
F FarzadFard
F Yates
HM Berman
IK McDonald
J Gao
JA Gerlt
JA Zitzewitz
JK Myers
K Mizuguchi
L Serrano
LG Presta
M Brylinski
M Gerstein
M Wilmanns
MM Altamirano
MM Gromiha
MM Sanchez del Pino
N Nagano
O Bilsel
R Aurora
R Vadrevu
Ramakrishna Vadrevu
RK Wierenga
Sagar V. Kathuria
SJ Wieczorek
V Kunin
W Kabsch
WL Delano
WR Forsyth
X Yang
Xiaoyan Yang
Y Wu
Z Gu
ZM Frenkel
Publication venue: Public Library of Science
Publication date: 01/09/2009

Non-local hydrogen bonding interactions between main chain amide hydrogen atoms and polar side chain acceptors that bracket consecutive βα or αβ elements of secondary structure in αTS from E. coli, a TIM barrel protein, have previously been found to contribute 4–6 kcal mol−1 to the stability of the native conformation. Experimental analysis of similar βα-hairpin clamps in a homologous pair of TIM barrel proteins of low sequence identity, IGPS from S. solfataricus and E. coli, reveals that this dramatic enhancement of stability is not unique to αTS. A survey of 71 TIM barrel proteins demonstrates a 4-fold symmetry for the placement of βα-hairpin clamps, bracing the fundamental βαβ building block and defining its register in the (βα)8 motif. The preferred sequences and locations of βα-hairpin clamps will enhance structure prediction algorithms and provide a strategy for engineering stability in TIM barrel proteins

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship@UMMS

Target selection and annotation for the structural genomics of the amidohydrolase and enolase superfamilies

Author: A Andreeva
A Sakai
A Weeks
AE Todd
Andrej Sali
C Nowlan
CH Wu
CM Seibert
D Vitkup
DA Benson
DL Wheeler
EF Pettersen
F Melo
Frank M. Raushel
H Berman
HJ Imker
J Akana
J Gough
J Lee
J. Michael Sauder
JA Gerlt
JB Bonanno
JB Thoden
JC Hermann
JC Norvell
JC Venter
JE Vick
Jeffrey B. Bonanno
Jennifer J. Seffernick
JF Rakus
JJ Irwin
John A. Gerlt
L Holm
L Song
L Williams
Libusha Kelly
Margaret E. Glasner
Mark R. Chance
Matthew P. Jacobson
ME Glasner
N Eswar
N Nagano
Narayanan Eswar
P Shannon
Patricia C. Babbitt
PC Babbitt
R Marti-Arbona
R Sanchez
R Tyagi
Ranyee Chiang
RS Hall
RZ Liao
SC Almo
SC Pegg
SD Brown
SF Altschul
Shoshana D. Brown
SL Schafer
Stephen K. Burley
Steven C. Almo
Subramanyam Swaminathan
TN Porter
TT Nguyen
U Pieper
Ursula Pieper
WS Yew
Xiaojing Zheng
Y Li
Publication venue: Springer Netherlands
Publication date: 01/01/2009

To study the substrate specificity of enzymes, we use the amidohydrolase and enolase superfamilies as model systems; members of these superfamilies share a common TIM barrel fold and catalyze a wide range of chemical reactions. Here, we describe a collaboration between the Enzyme Specificity Consortium (ENSPEC) and the New York SGX Research Center for Structural Genomics (NYSGXRC) that aims to maximize the structural coverage of the amidohydrolase and enolase superfamilies. Using sequence- and structure-based protein comparisons, we first selected 535 target proteins from a variety of genomes for high-throughput structure determination by X-ray crystallography; 63 of these targets were not previously annotated as superfamily members. To date, 20 unique amidohydrolase and 41 unique enolase structures have been determined, increasing the fraction of sequences in the two superfamilies that can be modeled based on at least 30% sequence identity from 45% to 73%. We present case studies of proteins related to uronate isomerase (an amidohydrolase superfamily member) and mandelate racemase (an enolase superfamily member), to illustrate how this structure-focused approach can be used to generate hypotheses about sequence–structure–function relationships

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

FLORA: a novel method to predict protein function from structure in diverse superfamilies

Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2–3 fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures by purely using patterns of structural conservation of all residues

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation

Author: A Andreeva
A Stark
BW Matthews
CJ Sigrist
CT Porter
E Krissinel
ED Scheeff
G Ausiello
GJ Kleywegt
Gong-Hua Li
H Ago
HM Berman
I Boltes
IN Shindyalov
JA Barker
JA Gerlt
JC Lagarias
Jing-Fei Huang
JS Fetrow
JW Torrance
K Kinoshita
L Holm
P Chen
PF Gherardini
RA Laskowski
RD Finn
RV Spriggs
S Schmitt
SF Altschul
T Fawcett
T Madej
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: The rapid development of structural genomics has resulted in many proteins of unknown function being deposited in the Protein Data Bank (PDB); predicting the functions of these proteins has therefore become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but their accuracy, sensitivity, and computational speed still need improvement. Here, an accurate algorithm, CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict unknown functions of proteins based on local protein structural similarity. The algorithm was evaluated on a test set of 164 enzyme families and compared with other methods. RESULTS: The evaluation shows that CMASA is highly accurate (0.96), sensitive (0.86), and fast enough to be used in large-scale functional annotation. Compared with both sequence-based and global structure-based methods, CMASA can find not only remote homologous proteins but also active-site convergence. Compared with other local structure comparison methods, CMASA performs better than both FFF (a method using geometry to predict protein function) and SPASM (a local structure alignment method), is more sensitive than PINTS, and is more accurate than JESS (both local structure alignment methods). CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and at least 166 putative catalytic sites were suggested that are not found in the Catalytic Site Atlas (CSA). CONCLUSIONS: CMASA is an accurate algorithm for detecting local protein structural similarity, and it has several advantages in predicting enzyme active sites. It can be used in large-scale enzyme active site annotation. CMASA is available through a mail-based server (http://159.226.149.45/other1/CMASA/CMASA.htm).
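
The accuracy and sensitivity figures quoted in this abstract can be related to confusion-matrix counts. The following is a minimal sketch, assuming the standard textbook definitions (the abstract does not say which definitions the CMASA evaluation used). The function names and the counts are hypothetical and are not taken from the paper.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct (standard definition)."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true catalytic-site matches that are recovered (recall)."""
    return tp / (tp + fn)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the CMASA paper.
    tp, fp, tn, fn = 86, 4, 896, 14
    print(f"accuracy    = {accuracy(tp, fp, tn, fn):.3f}")
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")
```

Reporting both values matters because a method can reach high accuracy on a test set dominated by negatives while still missing many true sites.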

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

CORRIE: enzyme sequence annotation with confidence estimates

Author: A Bairoch
AM Leontovich
Benjamin Audit
CA Ouzounis
CA Wilson
CH Wu
Christos A Ouzounis
D Devos
EA Bayer
ED Levy
Emmanuel D Levy
F Abascal
FD Schubot
G Casari
H Weiss
JA Gerlt
JL Ong
Leon Goldovsky
M des Jardins
MA Andrade
NC Kyrpides
O Lichtarge
P Bork
PD Karp
SF Altschul
VJ Promponas
Wally R Gilks
WG Krebs
WR Gilks
Y Zhang
Publication venue: BioMed Central
Publication date: 01/01/2007

Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method is able to correctly re-annotate 91% of all Enzyme Classification (EC) classes with high coverage (755 out of 827). Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes are not represented. We also show cases where the re-annotation procedure results in partial overlaps for those few enzyme classes where a certain inconsistency might appear between homologous proteins, mostly due to function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web-server, called CORRIE, which stands for Correspondence Indicator Estimation and allows the interactive prediction of a functional class for putative enzymes from sequence alone, supported by probabilistic measures in the context of the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at:

HAL-ENS-LYON

Crossref

Springer - Publisher Connector

PubMed Central

Retrieving sequences of enzymes experimentally characterized but erroneously annotated : the case of the putrescine carbamoyltransferase

Author: A Bairoch
A Sekowska
B Barcelona-Andres
B Labedan
B Wargnies
C Tricot
C Vander Wauven
GH Gonnet
I Paulsen
I Schomburg
J Felsenstein
JA Gerlt
JP Simon
L Grivell
M Kanehisa
M Zuniga
PC Babbitt
PD Karp
R Apweiler
R Cunin
RJ Roon
S Dashuang
SE Brenner
T Janowitz
TA Hall
The Gene Ontology Consortium
V Stalon
Y Nakada
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: Annotating genomes remains a hazardous task. Mistakes or gaps in such a complex process may occur when relevant knowledge is ignored, whether lost, forgotten or overlooked. This paper exemplifies an approach which could help to resuscitate such meaningful data. RESULTS: We show that a set of closely related sequences which have been annotated as ornithine carbamoyltransferases are actually putrescine carbamoyltransferases. This demonstration is based on the following points: (i) use of enzymatic data which had been overlooked, (ii) rediscovery of a short NH(2)-terminal sequence that allows a wrongly annotated ornithine carbamoyltransferase to be reannotated as a putrescine carbamoyltransferase, (iii) identification of conserved motifs that distinguish unambiguously between the two kinds of carbamoyltransferases, and (iv) comparative study of the gene context of these different sequences. CONCLUSIONS: We explain why this specific case of misannotation had not yet been described and draw attention to the fact that analogous instances must be rather frequent. We urge special caution when high sequence similarity is coupled with an apparent lack of biochemical information. Moreover, from the point of view of genome annotation, proteins which have been studied experimentally but are not correlated with sequence data in current databases qualify as "orphans", just as unassigned genomic open reading frames do. The strategy we used in this paper to bridge such gaps in knowledge could work whenever it is possible to collect a body of facts about experimental data, homology, unnoticed sequence data, and accurate information about gene context.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DI-fusion

Quantitative sequence-function relationships in proteins based on gene ontology

Author: A Bairoch
A Bateman
A Conesa
AE Todd
Arthur M Lesk
CA Wilson
CZ Cai
D Devos
Daniel J Blankenberg
E Camon
EL Sonnhammer
J Piatigorsky
JA Gerlt
JA Ranea
JC Whisstock
K Fleming
L Holm
LB Koski
LJ Jensen
M Ashburner
M Shadidy
MA Andrade
MD Ganfornina
N Hulo
Naomi Altman
P Bork
R Karp
RA Laskowski
RC Edgar
S Jones
S Nakayama
SB Needleman
SE Brenner
SF Altschul
SR Eddy
SS Jeong
T Doerks
TF Smith
TK Attwood
Vineet Sangar
X Lu
Publication venue: BioMed Central
Publication date: 01/08/2007

BACKGROUND: The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function, which is the basis of annotation transfer in databases, must therefore be regarded with caution. Here, we present a quantitative study of sequence and function divergence, based on the Gene Ontology classification of function. We determined the relationship between sequence divergence and function divergence in 6828 protein families from the PFAM database. Within families there is a broad range of sequence similarity, from very closely related proteins (for instance, orthologs in different mammals) to very distantly related proteins at the limit of reliable recognition of homology. RESULTS: We correlated the divergence in sequences, determined from pairwise alignments, with the divergence in function, determined by path lengths in the Gene Ontology graph, taking into account the fact that many proteins have multiple functions. Our results show that, among homologous proteins, the proportion of divergent functions decreases dramatically above a threshold of sequence similarity at about 50% residue identity. For proteins with more than 50% residue identity, transfer of annotation between homologs will lead to an erroneous attribution with a totally dissimilar function in fewer than 6% of cases. This means that for very similar proteins (about 50% identical residues) the chance of completely incorrect annotation is low; however, because of the phenomenon of recruitment, it is still non-zero. CONCLUSION: Our results describe general features of the evolution of protein function, and serve as a guide to the reliability of annotation transfer, based on the closeness of the relationship between a new protein and its nearest annotated relative.
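
The idea of measuring function divergence by path length in the Gene Ontology graph can be illustrated with a breadth-first search. This is a minimal sketch on a toy five-term graph, assuming edges are treated as undirected and every edge counts as one step; the paper's actual distance measure and its handling of proteins with multiple functions are not described in the abstract, and the helper names here are hypothetical.

```python
from collections import deque

# Toy fragment of an ontology, child -> list of parents (illustrative only).
TOY_GO = {
    "molecular_function": [],
    "catalytic_activity": ["molecular_function"],
    "binding": ["molecular_function"],
    "kinase_activity": ["catalytic_activity"],
    "hydrolase_activity": ["catalytic_activity"],
}


def build_undirected(parents):
    """Turn child -> parents links into an undirected adjacency map."""
    adj = {term: set() for term in parents}
    for child, plist in parents.items():
        for parent in plist:
            adj.setdefault(parent, set()).add(child)
            adj[child].add(parent)
    return adj


def path_length(parents, a, b):
    """Shortest number of edges between terms a and b, or None if unconnected."""
    adj = build_undirected(parents)
    if a not in adj or b not in adj:
        raise KeyError("unknown term")
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    print(path_length(TOY_GO, "kinase_activity", "hydrolase_activity"))
    print(path_length(TOY_GO, "kinase_activity", "binding"))
```

In this toy graph, two terms that share a direct parent are closer than two terms that meet only at the root, which is the intuition behind using path length as a proxy for functional dissimilarity.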

Crossref

Directory of Open Access Journals

PubMed Central

The FGGY carbohydrate kinase family : insights into the evolution of functional specificities

Author: A Osterman
A Vendeville
Adam Godzik
AE Todd
AM Schnoes
Andrei Osterman
B Reva
BE Engelhardt
BG Magor
CA Bonner
CA Orengo
Christos A. Ouzounis
CM Seibert
D Grueninger
D Wu
DA Lee
DA Rodionov
E Di Luccio
G Casari
GE Crooks
HM Berman
I Letunic
Irina Rodionova
JA Capra
JA Gerlt
JH Hurley
JI Yeh
K Sjolander
K Ye
KB Xavier
LA David
M Ormo
M Pachkov
ME Glasner
MN Price
MV Omelchenko
N Krishnamurthy
Olga Zagnitko
OV Kalinina
P Shannon
R Overbeek
RC Edgar
RD Finn
RK Aziz
S Cheek
SS Hannenhalli
TA Tatusova
TT Nguyen
W-D Fessner
Y Zhang
Ying Zhang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/12/2011

© The Author(s), 2011. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in PLoS Computational Biology 7 (2011): e1002318, doi:10.1371/journal.pcbi.1002318. Function diversification in large protein families is a major mechanism driving expansion of cellular networks, providing organisms with new metabolic capabilities and thus adding to their evolutionary success. However, our understanding of the evolutionary mechanisms of functional diversity in such families is very limited, which, among many other reasons, is due to the lack of functionally well-characterized sets of proteins. Here, using the FGGY carbohydrate kinase family as an example, we built a confidently annotated reference set (CARS) of proteins by propagating experimentally verified functional assignments to a limited number of homologous proteins that are supported by their genomic and functional contexts. Then, we analyzed, on both the phylogenetic and the molecular levels, the evolution of different functional specificities in this family. The results show that the different functions (substrate specificities) encoded by FGGY kinases have emerged only once in the evolutionary history following an apparently simple divergent evolutionary model. At the same time, on the molecular level, one isofunctional group (L-ribulokinase, AraB) evolved at least two independent solutions that employed distinct specificity-determining residues for the recognition of the same substrate (L-ribulose). Our analysis provides a detailed model of the evolution of the FGGY kinase family. It also shows that only combined molecular and phylogenetic approaches can help reconstruct a full picture of functional diversifications in such diverse families. This study was funded by NIH and DOE grants.

Public Library of Science (PLOS)

Crossref

Woods Hole Open Access Server

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Gene fusions and gene duplications: relevance to genomic annotation and functional analysis

Author: A Bateman
A Dautry-Varsat
A Maruya
B Labedan
C Vogel
CF Higgins
F Titgemeyer
GH Gonnet
GH Thomas
H Salgado
I Saint-Girons
IP Crawford
J Gough
JA Gerlt
JD Glasner
K Fukami-Kobayashi
LA Nahum
M El Ghachi
M Madera
M Riley
MH Serres
MY Galperin
NB Vartak
P Liang
PD Karp
PJ Piggot
R Jaggi
RM Schwartz
RR Chaudhuri
S Sundararaj
SB Needleman
SF Altschul
SY Yang
TF Smith
WR Gilks
Y Fujita
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Escherichia coli, a model organism, provides information for annotation of other genomes. Our analysis of its genome has shown that proteins encoded by fused genes need special attention. Such composite (multimodular) proteins consist of two or more components (modules) encoding distinct functions. Multimodular proteins have been found to complicate both annotation and generation of sequence-similar groups. Previous work overstated the number of multimodular proteins in E. coli. This work corrects the identification of modules by including sequence information from proteins in 50 sequenced microbial genomes. RESULTS: Multimodular E. coli K-12 proteins were identified from sequence similarities between their component modules and non-fused proteins in 50 genomes and from the literature. We found 109 multimodular proteins in E. coli containing either two or three modules. Most modules had standalone sequence relatives in other genomes. The separated modules together with all the single (un-fused) proteins constitute the sum of all unimodular proteins of E. coli. Pairwise sequence relationships among all E. coli unimodular proteins generated 490 sequence-similar, paralogous groups. Groups ranged in size from 92 to 2 members and had varying degrees of relatedness among their members. Some E. coli enzyme groups were compared to homologs in other bacterial genomes. CONCLUSION: The deleterious effects of multimodular proteins on annotation and on the formation of groups of paralogs are emphasized. To improve annotation results, all multimodular proteins in an organism should be detected and, when known, each function should be connected with its location in the sequence of the protein. When transferring functions by sequence similarity, alignment locations must be noted, particularly when alignments cover only part of the sequences, in order to enable transfer of the correct function. Separating multimodular proteins into module units makes it possible to generate protein groups related by both sequence and function, avoiding mixing of unrelated sequences. Organisms differ in sizes of groups of sequence-related proteins. A sample comparison of orthologs to selected E. coli paralogous groups correlates with known physiological and taxonomic relationships between the organisms.
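
One simple way to turn pairwise sequence relationships into groups is single-linkage clustering with a union-find structure. This is a minimal sketch under that assumption; the abstract does not state which grouping procedure the authors used, and the function name and protein identifiers below are hypothetical. It also shows why an unsplit fused protein is a problem: a single protein similar to two otherwise unrelated families would merge them into one group.

```python
def paralogous_groups(proteins, similar_pairs):
    """Group proteins by transitive pairwise similarity (single linkage).

    Returns groups with at least two members, largest first.
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    multi = [sorted(g) for g in groups.values() if len(g) >= 2]
    return sorted(multi, key=lambda g: (-len(g), g))


if __name__ == "__main__":
    # Hypothetical identifiers; "fusedXY" stands for an unsplit two-module protein.
    proteins = ["x1", "x2", "y1", "y2", "fusedXY", "lone"]
    unsplit = [("x1", "x2"), ("y1", "y2"), ("x1", "fusedXY"), ("y1", "fusedXY")]
    print(paralogous_groups(proteins, unsplit))
    # Without the fused protein's links, the two families stay separate.
    print(paralogous_groups(proteins, [("x1", "x2"), ("y1", "y2")]))
```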

Crossref

Woods Hole Open Access Server

Springer - Publisher Connector

PubMed Central